{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "nibpbUnTsxTd"
      },
      "source": [
        "##### Copyright 2018 The TensorFlow Authors."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "cellView": "form",
        "id": "tXAbWHtqs1Y2"
      },
      "outputs": [],
      "source": [
        "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n",
        "# you may not use this file except in compliance with the License.\n",
        "# You may obtain a copy of the License at\n",
        "#\n",
        "# https://www.apache.org/licenses/LICENSE-2.0\n",
        "#\n",
        "# Unless required by applicable law or agreed to in writing, software\n",
        "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
        "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
        "# See the License for the specific language governing permissions and\n",
        "# limitations under the License."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "HTgMAvQq-PU_"
      },
      "source": [
        "# Ragged tensors\n",
        "\n",
        "<table class=\"tfo-notebook-buttons\" align=\"left\">\n",
        "  <td>\n",
        "    <a target=\"_blank\" href=\"https://www.tensorflow.org/guide/ragged_tensor\"><img src=\"https://www.tensorflow.org/images/tf_logo_32px.png\" />View on TensorFlow.org</a>\n",
        "  </td>\n",
        "  <td>\n",
        "    <a target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/docs/blob/master/site/en/guide/ragged_tensor.ipynb\"><img src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" />Run in Google Colab</a>\n",
        "  </td>\n",
        "  <td>\n",
        "    <a target=\"_blank\" href=\"https://github.com/tensorflow/docs/blob/master/site/en/guide/ragged_tensor.ipynb\"><img src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" />View source on GitHub</a>\n",
        "  </td>\n",
        "  <td>\n",
        "    <a href=\"https://storage.googleapis.com/tensorflow_docs/docs/site/en/guide/ragged_tensor.ipynb\"><img src=\"https://www.tensorflow.org/images/download_logo_32px.png\" />Download notebook</a>\n",
        "  </td>\n",
        "</table>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "5DP8XNP-6zlu"
      },
      "source": [
        "**API Documentation:** [`tf.RaggedTensor`](https://www.tensorflow.org/api_docs/python/tf/RaggedTensor) [`tf.ragged`](https://www.tensorflow.org/api_docs/python/tf/ragged)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "cDIUjj07-rQg"
      },
      "source": [
        "## Setup"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "KKvdSorS-pDD"
      },
      "outputs": [],
      "source": [
        "import math\n",
        "import tensorflow as tf"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "pxi0m_yf-te5"
      },
      "source": [
        "## Overview\n",
        "\n",
        "Your data comes in many shapes; your tensors should too.\n",
        "*Ragged tensors* are the TensorFlow equivalent of nested variable-length\n",
        "lists. They make it easy to store and process data with non-uniform shapes,\n",
        "including:\n",
        "\n",
        "*   Variable-length features, such as the set of actors in a movie.\n",
        "*   Batches of variable-length sequential inputs, such as sentences or video\n",
        "    clips.\n",
        "*   Hierarchical inputs, such as text documents that are subdivided into\n",
        "    sections, paragraphs, sentences, and words.\n",
        "*   Individual fields in structured inputs, such as protocol buffers.\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "1mhU_qY3_mla"
      },
      "source": [
        "### What you can do with a ragged tensor\n",
        "\n",
        "Ragged tensors are supported by more than a hundred TensorFlow operations,\n",
        "including math operations (such as `tf.add` and `tf.reduce_mean`), array operations\n",
        "(such as `tf.concat` and `tf.tile`), string manipulation ops (such as\n",
        "`tf.strings.substr`), control flow operations (such as `tf.while_loop` and `tf.map_fn`), and many others:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "vGmJGSf_-PVB"
      },
      "outputs": [],
      "source": [
        "digits = tf.ragged.constant([[3, 1, 4, 1], [], [5, 9, 2], [6], []])\n",
        "words = tf.ragged.constant([[\"So\", \"long\"], [\"thanks\", \"for\", \"all\", \"the\", \"fish\"]])\n",
        "print(tf.add(digits, 3))\n",
        "print(tf.reduce_mean(digits, axis=1))\n",
        "print(tf.concat([digits, [[5, 3]]], axis=0))\n",
        "print(tf.tile(digits, [1, 2]))\n",
        "print(tf.strings.substr(words, 0, 2))\n",
        "print(tf.map_fn(tf.math.square, digits))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Pt-5OIc8-PVG"
      },
      "source": [
        "There are also a number of methods and operations that are\n",
        "specific to ragged tensors, including factory methods, conversion methods,\n",
        "and value-mapping operations.\n",
        "For a list of supported ops, see the **`tf.ragged` package\n",
        "documentation**."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "r8fjGgf3B_6z"
      },
      "source": [
        "Ragged tensors are supported by many TensorFlow APIs, including [Keras](https://www.tensorflow.org/guide/keras), [Datasets](https://www.tensorflow.org/guide/data), [tf.function](https://www.tensorflow.org/guide/function), [SavedModels](https://www.tensorflow.org/guide/saved_model), and [tf.Example](https://www.tensorflow.org/tutorials/load_data/tfrecord).  For more information, see the section on **TensorFlow APIs** below."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "aTXLjQlcHP8a"
      },
      "source": [
        "As with normal tensors, you can use Python-style indexing to access specific\n",
        "slices of a ragged tensor. For more information, see the section on\n",
        "**Indexing** below."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "n8YMKXpI-PVH"
      },
      "outputs": [],
      "source": [
        "print(digits[0])       # First row"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "Awi8i9q5_DuX"
      },
      "outputs": [],
      "source": [
        "print(digits[:, :2])   # First two values in each row."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "sXgQtTcgHHMR"
      },
      "outputs": [],
      "source": [
        "print(digits[:, -2:])  # Last two values in each row."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "6FU5T_-8-PVK"
      },
      "source": [
        "And just like normal tensors, you can use Python arithmetic and comparison\n",
        "operators to perform elementwise operations. For more information, see the section on\n",
        "**Overloaded Operators** below."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "2tdUEtb7-PVL"
      },
      "outputs": [],
      "source": [
        "print(digits + 3)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "X-bxG0nc_Nmf"
      },
      "outputs": [],
      "source": [
        "print(digits + tf.ragged.constant([[1, 2, 3, 4], [], [5, 6, 7], [8], []]))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "2tsw8mN0ESIT"
      },
      "source": [
        "If you need to perform an elementwise transformation to the values of a `RaggedTensor`, you can use `tf.ragged.map_flat_values`, which takes a function plus one or more arguments, and applies the function to transform the `RaggedTensor`'s values."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "pvt5URbdEt-D"
      },
      "outputs": [],
      "source": [
        "times_two_plus_one = lambda x: x * 2 + 1\n",
        "print(tf.ragged.map_flat_values(times_two_plus_one, digits))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "HNxF6_QKAzkl"
      },
      "source": [
        "Ragged tensors can be converted to nested Python `list`s and NumPy `array`s:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "A5NHb8ViA9dt"
      },
      "outputs": [],
      "source": [
        "digits.to_list()"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "2o1wogVyA6Yp"
      },
      "outputs": [],
      "source": [
        "digits.numpy()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "7M5RHOgp-PVN"
      },
      "source": [
        "### Constructing a ragged tensor\n",
        "\n",
        "The simplest way to construct a ragged tensor is using\n",
        "`tf.ragged.constant`, which builds the\n",
        "`RaggedTensor` corresponding to a given nested Python `list` or NumPy `array`:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "yhgKMozw-PVP"
      },
      "outputs": [],
      "source": [
        "sentences = tf.ragged.constant([\n",
        "    [\"Let's\", \"build\", \"some\", \"ragged\", \"tensors\", \"!\"],\n",
        "    [\"We\", \"can\", \"use\", \"tf.ragged.constant\", \".\"]])\n",
        "print(sentences)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "TW1g7eE2ee8M"
      },
      "outputs": [],
      "source": [
        "paragraphs = tf.ragged.constant([\n",
        "    [['I', 'have', 'a', 'cat'], ['His', 'name', 'is', 'Mat']],\n",
        "    [['Do', 'you', 'want', 'to', 'come', 'visit'], [\"I'm\", 'free', 'tomorrow']],\n",
        "])\n",
        "print(paragraphs)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "SPLn5xHn-PVR"
      },
      "source": [
        "Ragged tensors can also be constructed by pairing flat *values* tensors with\n",
        "*row-partitioning* tensors indicating how those values should be divided into\n",
        "rows, using factory classmethods such as `tf.RaggedTensor.from_value_rowids`,\n",
        "`tf.RaggedTensor.from_row_lengths`, and\n",
        "`tf.RaggedTensor.from_row_splits`.\n",
        "\n",
        "#### `tf.RaggedTensor.from_value_rowids`\n",
        "If you know which row each value belongs in, then you can build a `RaggedTensor` using a `value_rowids` row-partitioning tensor:\n",
        "\n",
        "![value_rowids](https://www.tensorflow.org/images/ragged_tensors/value_rowids.png)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "SEvcPUcl-PVS"
      },
      "outputs": [],
      "source": [
        "print(tf.RaggedTensor.from_value_rowids(\n",
        "    values=[3, 1, 4, 1, 5, 9, 2],\n",
        "    value_rowids=[0, 0, 0, 0, 2, 2, 3]))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "RBQh8sYc-PVV"
      },
      "source": [
        "#### `tf.RaggedTensor.from_row_lengths`\n",
        "\n",
        "If you know how long each row is, then you can use a `row_lengths` row-partitioning tensor:\n",
        "\n",
        "![row_lengths](https://www.tensorflow.org/images/ragged_tensors/row_lengths.png)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "LBY81WXl-PVW"
      },
      "outputs": [],
      "source": [
        "print(tf.RaggedTensor.from_row_lengths(\n",
        "    values=[3, 1, 4, 1, 5, 9, 2],\n",
        "    row_lengths=[4, 0, 2, 1]))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "8p5V8_Iu-PVa"
      },
      "source": [
        "#### `tf.RaggedTensor.from_row_splits`\n",
        "\n",
        "If you know the index where each row starts and ends, then you can use a `row_splits` row-partitioning tensor:\n",
        "\n",
        "![row_splits](https://www.tensorflow.org/images/ragged_tensors/row_splits.png)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "FwizuqZI-PVb"
      },
      "outputs": [],
      "source": [
        "print(tf.RaggedTensor.from_row_splits(\n",
        "    values=[3, 1, 4, 1, 5, 9, 2],\n",
        "    row_splits=[0, 4, 4, 6, 7]))"
      ]
    },
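    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The three row-partitioning encodings describe the same information, so a ragged tensor built with one factory method can report the others. The sketch below (the variable name `rt` is illustrative) builds a tensor from `row_splits` and then reads back its `row_lengths` and `value_rowids`:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import tensorflow as tf\n",
        "\n",
        "rt = tf.RaggedTensor.from_row_splits(\n",
        "    values=[3, 1, 4, 1, 5, 9, 2],\n",
        "    row_splits=[0, 4, 4, 6, 7])\n",
        "\n",
        "print(rt.row_splits)      # Start/end offsets of each row: [0, 4, 4, 6, 7]\n",
        "print(rt.row_lengths())   # Length of each row: [4, 0, 2, 1]\n",
        "print(rt.value_rowids())  # Row index of each value: [0, 0, 0, 0, 2, 2, 3]"
      ]
    },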
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "E-9imo8DhwuA"
      },
      "source": [
        "See the `tf.RaggedTensor` class documentation for a full list of factory methods.\n",
        "\n",
        "Note: By default, these factory methods add assertions that the row partition tensor is well-formed and consistent with the number of values.  The `validate=False` parameter can be used to skip these checks if you can guarantee that the inputs are well-formed and consistent."
      ]
    },
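    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "As a rough illustration of the note on validation, the sketch below passes an inconsistent `row_lengths` tensor (the lengths sum to 6, but there are 7 values). With the default validation, the factory method is expected to reject the input. The exact exception type may depend on the TensorFlow version and execution mode, so both likely types are caught here. The second call skips the checks on well-formed input:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import tensorflow as tf\n",
        "\n",
        "values = [3, 1, 4, 1, 5, 9, 2]\n",
        "\n",
        "try:\n",
        "  # Row lengths sum to 6, which does not match the 7 values.\n",
        "  tf.RaggedTensor.from_row_lengths(values=values, row_lengths=[4, 0, 2, 0])\n",
        "except (ValueError, tf.errors.InvalidArgumentError) as exception:\n",
        "  print(type(exception).__name__)\n",
        "\n",
        "# Well-formed input: the consistency checks can be skipped with validate=False.\n",
        "print(tf.RaggedTensor.from_row_lengths(\n",
        "    values=values, row_lengths=[4, 0, 2, 1], validate=False))"
      ]
    },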
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "YQAOsT1_-PVg"
      },
      "source": [
        "### What you can store in a ragged tensor\n",
        "\n",
        "As with normal `Tensor`s, the values in a `RaggedTensor` must all have the same\n",
        "type; and the values must all be at the same nesting depth (the *rank* of the\n",
        "tensor):"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "SqbPBd_w-PVi"
      },
      "outputs": [],
      "source": [
        "print(tf.ragged.constant([[\"Hi\"], [\"How\", \"are\", \"you\"]]))  # ok: type=string, rank=2"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "83ZCSJnQAWAf"
      },
      "outputs": [],
      "source": [
        "print(tf.ragged.constant([[[1, 2], [3]], [[4, 5]]]))        # ok: type=int32, rank=3"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "ewA3cISdDfmP"
      },
      "outputs": [],
      "source": [
        "try:\n",
        "  tf.ragged.constant([[\"one\", \"two\"], [3, 4]])              # bad: multiple types\n",
        "except ValueError as exception:\n",
        "  print(exception)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "EOWIlVidDl-n"
      },
      "outputs": [],
      "source": [
        "try:\n",
        "  tf.ragged.constant([\"A\", [\"B\", \"C\"]])                     # bad: multiple nesting depths\n",
        "except ValueError as exception:\n",
        "  print(exception)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "nhHMFhSp-PVq"
      },
      "source": [
        "## Example use case\n",
        "\n",
        "The following example demonstrates how `RaggedTensor`s can be used to construct\n",
        "and combine unigram and bigram embeddings for a batch of variable-length\n",
        "queries, using special markers for the beginning and end of each sentence.\n",
        "For more details on the ops used in this example, see the `tf.ragged` package documentation."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "ZBs_V7e--PVr"
      },
      "outputs": [],
      "source": [
        "queries = tf.ragged.constant([['Who', 'is', 'Dan', 'Smith'],\n",
        "                              ['Pause'],\n",
        "                              ['Will', 'it', 'rain', 'later', 'today']])\n",
        "\n",
        "# Create an embedding table.\n",
        "num_buckets = 1024\n",
        "embedding_size = 4\n",
        "embedding_table = tf.Variable(\n",
        "    tf.random.truncated_normal([num_buckets, embedding_size],\n",
        "                       stddev=1.0 / math.sqrt(embedding_size)))\n",
        "\n",
        "# Look up the embedding for each word.\n",
        "word_buckets = tf.strings.to_hash_bucket_fast(queries, num_buckets)\n",
        "word_embeddings = tf.nn.embedding_lookup(embedding_table, word_buckets)     # ①\n",
        "\n",
        "# Add markers to the beginning and end of each sentence.\n",
        "marker = tf.fill([queries.nrows(), 1], '#')\n",
        "padded = tf.concat([marker, queries, marker], axis=1)                       # ②\n",
        "\n",
        "# Build word bigrams & look up embeddings.\n",
        "bigrams = tf.strings.join([padded[:, :-1], padded[:, 1:]], separator='+')   # ③\n",
        "\n",
        "bigram_buckets = tf.strings.to_hash_bucket_fast(bigrams, num_buckets)\n",
        "bigram_embeddings = tf.nn.embedding_lookup(embedding_table, bigram_buckets) # ④\n",
        "\n",
        "# Find the average embedding for each sentence\n",
        "all_embeddings = tf.concat([word_embeddings, bigram_embeddings], axis=1)    # ⑤\n",
        "avg_embedding = tf.reduce_mean(all_embeddings, axis=1)                      # ⑥\n",
        "print(avg_embedding)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Y_lE_LAVcWQH"
      },
      "source": [
        "![ragged_example](https://www.tensorflow.org/images/ragged_tensors/ragged_example.png)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "An_k0pX1-PVt"
      },
      "source": [
        "## Ragged and uniform dimensions\n",
        "\n",
        "A ***ragged dimension*** is a dimension whose slices may have different lengths. For example, the\n",
        "inner (column) dimension of `rt=[[3, 1, 4, 1], [], [5, 9, 2], [6], []]` is\n",
        "ragged, since the column slices (`rt[0, :]`, ..., `rt[4, :]`) have different\n",
        "lengths. Dimensions whose slices all have the same length are called *uniform\n",
        "dimensions*.\n",
        "\n",
        "The outermost dimension of a ragged tensor is always uniform, since it consists\n",
        "of a single slice (and so there is no possibility for differing slice\n",
        "lengths).  The remaining dimensions may be either ragged or uniform. For\n",
        "example, we might store the word embeddings for\n",
        "each word in a batch of sentences using a ragged tensor with shape\n",
        "`[num_sentences, (num_words), embedding_size]`, where the parentheses around\n",
        "`(num_words)` indicate that the dimension is ragged.\n",
        "\n",
        "![sent_word_embed](https://www.tensorflow.org/images/ragged_tensors/sent_word_embed.png)\n",
        "\n",
        "Ragged tensors may have multiple ragged dimensions. For example, we could store\n",
        "a batch of structured text documents using a tensor with shape `[num_documents,\n",
        "(num_paragraphs), (num_sentences), (num_words)]` (where again parentheses are\n",
        "used to indicate ragged dimensions).\n",
        "\n",
        "As with `tf.Tensor`, the ***rank*** of a ragged tensor is its total number of dimensions (including both ragged and uniform dimensions).\n",
        "A ***potentially ragged tensor*** is a value that might be\n",
        "either a `tf.Tensor` or a `tf.RaggedTensor`.\n",
        "\n",
        "When describing the shape of a RaggedTensor, ragged dimensions are conventionally indicated by\n",
        "enclosing them in parentheses. For example, as we saw above, the shape of a 3-D\n",
        "RaggedTensor that stores word embeddings for each word in a batch of sentences\n",
        "can be written as `[num_sentences, (num_words), embedding_size]`.\n",
        "\n",
        "The `RaggedTensor.shape` attribute returns a `tf.TensorShape` for a\n",
        "ragged tensor, where ragged dimensions have size `None`:\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "M2Wzx4JEIvmb"
      },
      "outputs": [],
      "source": [
        "tf.ragged.constant([[\"Hi\"], [\"How\", \"are\", \"you\"]]).shape"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "G9tfJOeFlijE"
      },
      "source": [
        "The method `tf.RaggedTensor.bounding_shape` can be used to find a tight\n",
        "bounding shape for a given `RaggedTensor`:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "5DHaqXHxlWi0"
      },
      "outputs": [],
      "source": [
        "print(tf.ragged.constant([[\"Hi\"], [\"How\", \"are\", \"you\"]]).bounding_shape())"
      ]
    },
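    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "To make the distinction between ragged and uniform dimensions concrete, here is a small sketch (the data and the name `embeddings` are illustrative). Passing `ragged_rank=1` to `tf.ragged.constant` keeps the innermost dimension uniform, so only the middle dimension is reported as `None`:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import tensorflow as tf\n",
        "\n",
        "# Two 'sentences' with 2 and 1 'words'; every word has a 3-element vector.\n",
        "embeddings = tf.ragged.constant(\n",
        "    [[[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],\n",
        "     [[0.7, 0.8, 0.9]]],\n",
        "    ragged_rank=1)\n",
        "\n",
        "print(embeddings.shape)             # (2, None, 3)\n",
        "print(embeddings.ragged_rank)       # 1\n",
        "print(embeddings.bounding_shape())  # Tight bounds: [2, 2, 3]"
      ]
    },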
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "V8e7x95UcLS6"
      },
      "source": [
        "## Ragged vs. sparse\n",
        "\n",
        "A ragged tensor should *not* be thought of as a type of sparse tensor.  In particular, sparse tensors are *efficient encodings for tf.Tensor* that model the same data in a compact format, whereas a ragged tensor is an *extension to tf.Tensor* that models an expanded class of data.  This difference is crucial when defining operations:\n",
        "\n",
        "* Applying an op to a sparse or dense tensor should always give the same result.\n",
        "* Applying an op to a ragged or sparse tensor may give different results.\n",
        "\n",
        "As an illustrative example, consider how array operations such as `concat`,\n",
        "`stack`, and `tile` are defined for ragged vs. sparse tensors. Concatenating\n",
        "ragged tensors joins each row to form a single row with the combined length:\n",
        "\n",
        "![ragged_concat](https://www.tensorflow.org/images/ragged_tensors/ragged_concat.png)\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "ush7IGUWLXIn"
      },
      "outputs": [],
      "source": [
        "ragged_x = tf.ragged.constant([[\"John\"], [\"a\", \"big\", \"dog\"], [\"my\", \"cat\"]])\n",
        "ragged_y = tf.ragged.constant([[\"fell\", \"asleep\"], [\"barked\"], [\"is\", \"fuzzy\"]])\n",
        "print(tf.concat([ragged_x, ragged_y], axis=1))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "pvQzZG8zMoWa"
      },
      "source": [
        "But concatenating sparse tensors is equivalent to concatenating the corresponding dense tensors,\n",
        "as illustrated by the following example (where Ø indicates missing values):\n",
        "\n",
        "![sparse_concat](https://www.tensorflow.org/images/ragged_tensors/sparse_concat.png)\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "eTIhGayQL0gI"
      },
      "outputs": [],
      "source": [
        "sparse_x = ragged_x.to_sparse()\n",
        "sparse_y = ragged_y.to_sparse()\n",
        "sparse_result = tf.sparse.concat(sp_inputs=[sparse_x, sparse_y], axis=1)\n",
        "print(tf.sparse.to_dense(sparse_result, ''))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Vl8eQN8pMuYx"
      },
      "source": [
        "For another example of why this distinction is important, consider the\n",
        "definition of “the mean value of each row” for an op such as `tf.reduce_mean`.\n",
        "For a ragged tensor, the mean value for a row is the sum of the\n",
        "row’s values divided by the row’s width.\n",
        "But for a sparse tensor, the mean value for a row is the sum of the\n",
        "row’s values divided by the sparse tensor’s overall width (which is\n",
        "greater than or equal to the width of the longest row).\n"
      ]
    },
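    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The difference can be sketched with a small example (the names `row_values` and `overall_width` are illustrative). Since this section does not name a sparse mean op, the sparse case below is computed by hand, as the row sum from `tf.sparse.reduce_sum` divided by the sparse tensor's overall width:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import tensorflow as tf\n",
        "\n",
        "row_values = tf.ragged.constant([[3.0, 1.0, 4.0, 1.0], [5.0, 9.0], [2.0]])\n",
        "\n",
        "# Ragged: each row sum is divided by that row's own length.\n",
        "print(tf.reduce_mean(row_values, axis=1))  # [2.25, 7.0, 2.0]\n",
        "\n",
        "# Sparse: each row sum is divided by the overall width (4 here).\n",
        "sparse_values = row_values.to_sparse()\n",
        "overall_width = tf.cast(sparse_values.dense_shape[1], tf.float32)\n",
        "print(tf.sparse.reduce_sum(sparse_values, axis=1) / overall_width)  # [2.25, 3.5, 0.5]"
      ]
    },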
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "u4yjxcK7IPXc"
      },
      "source": [
        "## TensorFlow APIs"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "VoZGwFQjIYU5"
      },
      "source": [
        "### Keras\n",
        "\n",
        "[tf.keras](https://www.tensorflow.org/guide/keras) is TensorFlow's high-level API for building and training deep learning models. Ragged tensors may be passed as inputs to a Keras model by setting `ragged=True` on `tf.keras.Input` or `tf.keras.layers.InputLayer`.  Ragged tensors may also be passed between Keras layers, and returned by Keras models.  The following example shows a toy LSTM model that is trained using ragged tensors."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "pHls7hQVJlk5"
      },
      "outputs": [],
      "source": [
        "# Task: predict whether each sentence is a question or not.\n",
        "sentences = tf.constant(\n",
        "    ['What makes you think she is a witch?',\n",
        "     'She turned me into a newt.',\n",
        "     'A newt?',\n",
        "     'Well, I got better.'])\n",
        "is_question = tf.constant([True, False, True, False])\n",
        "\n",
        "# Preprocess the input strings.\n",
        "hash_buckets = 1000\n",
        "words = tf.strings.split(sentences, ' ')\n",
        "hashed_words = tf.strings.to_hash_bucket_fast(words, hash_buckets)\n",
        "\n",
        "# Build the Keras model.\n",
        "keras_model = tf.keras.Sequential([\n",
        "    tf.keras.layers.Input(shape=[None], dtype=tf.int64, ragged=True),\n",
        "    tf.keras.layers.Embedding(hash_buckets, 16),\n",
        "    tf.keras.layers.LSTM(32, use_bias=False),\n",
        "    tf.keras.layers.Dense(32),\n",
        "    tf.keras.layers.Activation(tf.nn.relu),\n",
        "    tf.keras.layers.Dense(1)\n",
        "])\n",
        "\n",
        "keras_model.compile(loss='binary_crossentropy', optimizer='rmsprop')\n",
        "keras_model.fit(hashed_words, is_question, epochs=5)\n",
        "print(keras_model.predict(hashed_words))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "8B_sdlt6Ij61"
      },
      "source": [
        "### tf.Example\n",
        "\n",
        "[tf.Example](https://www.tensorflow.org/tutorials/load_data/tfrecord) is a standard [protobuf](https://developers.google.com/protocol-buffers/) encoding for TensorFlow data.  Data encoded with `tf.Example`s often includes variable-length features.  For example, the following code defines a batch of four `tf.Example` messages with different feature lengths:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "xsiglYM7TXGr"
      },
      "outputs": [],
      "source": [
        "import google.protobuf.text_format as pbtext\n",
        "\n",
        "def build_tf_example(s):\n",
        "  return pbtext.Merge(s, tf.train.Example()).SerializeToString()\n",
        "\n",
        "example_batch = [\n",
        "  build_tf_example(r'''\n",
        "    features {\n",
        "      feature {key: \"colors\" value {bytes_list {value: [\"red\", \"blue\"]} } }\n",
        "      feature {key: \"lengths\" value {int64_list {value: [7]} } } }'''),\n",
        "  build_tf_example(r'''\n",
        "    features {\n",
        "      feature {key: \"colors\" value {bytes_list {value: [\"orange\"]} } }\n",
        "      feature {key: \"lengths\" value {int64_list {value: []} } } }'''),\n",
        "  build_tf_example(r'''\n",
        "    features {\n",
        "      feature {key: \"colors\" value {bytes_list {value: [\"black\", \"yellow\"]} } }\n",
        "      feature {key: \"lengths\" value {int64_list {value: [1, 3]} } } }'''),\n",
        "  build_tf_example(r'''\n",
        "    features {\n",
        "      feature {key: \"colors\" value {bytes_list {value: [\"green\"]} } }\n",
        "      feature {key: \"lengths\" value {int64_list {value: [3, 5, 2]} } } }''')]"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "szUuXFvtUL2o"
      },
      "source": [
        "We can parse this encoded data using `tf.io.parse_example`, which takes a tensor of serialized strings and a feature specification dictionary, and returns a dictionary mapping feature names to tensors.  To read the variable-length features into ragged tensors, we simply use `tf.io.RaggedFeature` in the feature specification dictionary:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "xcdaIbYVT4mo"
      },
      "outputs": [],
      "source": [
        "feature_specification = {\n",
        "    'colors': tf.io.RaggedFeature(tf.string),\n",
        "    'lengths': tf.io.RaggedFeature(tf.int64),\n",
        "}\n",
        "feature_tensors = tf.io.parse_example(example_batch, feature_specification)\n",
        "for name, value in feature_tensors.items():\n",
        "  print(\"{}={}\".format(name, value))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "IK9X_8rXVr8h"
      },
      "source": [
        "`tf.io.RaggedFeature` can also be used to read features with multiple ragged dimensions.  For details, see the [API documentation](https://www.tensorflow.org/api_docs/python/tf/io/RaggedFeature)."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "UJowRhlxIX0R"
      },
      "source": [
        "### Datasets\n",
        "\n",
        "[tf.data](https://www.tensorflow.org/guide/data) is an API that enables you to build complex input pipelines from simple, reusable pieces.  Its core data structure is `tf.data.Dataset`, which represents a sequence of elements, in which each element consists of one or more components. "
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "fBml1m2G2vO9"
      },
      "outputs": [],
      "source": [
        "# Helper function used to print datasets in the examples below.\n",
        "def print_dictionary_dataset(dataset):\n",
        "  for i, element in enumerate(dataset):\n",
        "    print(\"Element {}:\".format(i))\n",
        "    for (feature_name, feature_value) in element.items():\n",
        "      print('{:>14} = {}'.format(feature_name, feature_value))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "gEu_H1Sp2jz1"
      },
      "source": [
        "#### Building Datasets with ragged tensors\n",
        "\n",
        "Datasets can be built from ragged tensors using the same methods that are used to build them from `tf.Tensor`s or NumPy `array`s, such as `Dataset.from_tensor_slices`:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "BuelF_y2mEq9"
      },
      "outputs": [],
      "source": [
        "dataset = tf.data.Dataset.from_tensor_slices(feature_tensors)\n",
        "print_dictionary_dataset(dataset)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "mC-QNkJc56De"
      },
      "source": [
        "Note: `Dataset.from_generator` does not support ragged tensors yet, but support will be added soon."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "K0UKvBLf1VMu"
      },
      "source": [
        "#### Batching and unbatching Datasets with ragged tensors\n",
        "\n",
        "Datasets with ragged tensors can be batched (which combines *n* consecutive elements into a single element) using the `Dataset.batch` method."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "lk62aRz63IZn"
      },
      "outputs": [],
      "source": [
        "batched_dataset = dataset.batch(2)\n",
        "print_dictionary_dataset(batched_dataset)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "NLSGiYEQ5A8N"
      },
      "source": [
        "Conversely, a batched dataset can be transformed into a flat dataset using `Dataset.unbatch`."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "CxLlaPw_5Je4"
      },
      "outputs": [],
      "source": [
        "unbatched_dataset = batched_dataset.unbatch()\n",
        "print_dictionary_dataset(unbatched_dataset)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "YzpLQFh33q0N"
      },
      "source": [
        "#### Batching Datasets with variable-length non-ragged tensors\n",
        "\n",
        "If you have a Dataset that contains non-ragged tensors, and tensor lengths vary across elements, then you can batch those non-ragged tensors into ragged tensors by applying the `dense_to_ragged_batch` transformation:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "PYnhERwh3_mf"
      },
      "outputs": [],
      "source": [
        "non_ragged_dataset = tf.data.Dataset.from_tensor_slices([1, 5, 3, 2, 8])\n",
        "non_ragged_dataset = non_ragged_dataset.map(tf.range)\n",
        "batched_non_ragged_dataset = non_ragged_dataset.apply(\n",
        "    tf.data.experimental.dense_to_ragged_batch(2))\n",
        "for element in batched_non_ragged_dataset:\n",
        "  print(element)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "nXFPeE-CzJ-s"
      },
      "source": [
        "#### Transforming Datasets with ragged tensors\n",
        "\n",
        "Ragged tensors in Datasets can also be created or transformed using `Dataset.map`."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "Ios1GuG-pf9U"
      },
      "outputs": [],
      "source": [
        "def transform_lengths(features):\n",
        "  return {\n",
        "      'mean_length': tf.math.reduce_mean(features['lengths']),\n",
        "      'length_ranges': tf.ragged.range(features['lengths'])}\n",
        "transformed_dataset = dataset.map(transform_lengths)\n",
        "print_dictionary_dataset(transformed_dataset)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "WD2lWw3fIXrg"
      },
      "source": [
        "### tf.function\n",
        "\n",
        "[tf.function](https://www.tensorflow.org/guide/function) is a decorator that precomputes TensorFlow graphs for Python functions, which can substantially improve the performance of your TensorFlow code.  Ragged tensors can be used transparently with `@tf.function`-decorated functions.  For example, the following function works with both ragged and non-ragged tensors:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "PfyxgVaj_8tl"
      },
      "outputs": [],
      "source": [
        "@tf.function\n",
        "def make_palindrome(x, axis):\n",
        "  return tf.concat([x, tf.reverse(x, [axis])], axis)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "vcZdzvEnDEt0"
      },
      "outputs": [],
      "source": [
        "make_palindrome(tf.constant([[1, 2], [3, 4], [5, 6]]), axis=1)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "4WfCMIgdDMxj"
      },
      "outputs": [],
      "source": [
        "make_palindrome(tf.ragged.constant([[1, 2], [3], [4, 5, 6]]), axis=1)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "X2p69YPOBUz8"
      },
      "source": [
        "If you wish to explicitly specify the `input_signature` for the `tf.function`, then you can do so using `tf.RaggedTensorSpec`."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "k6-hkhdDBk6G"
      },
      "outputs": [],
      "source": [
        "@tf.function(\n",
        "    input_signature=[tf.RaggedTensorSpec(shape=[None, None], dtype=tf.int32)])\n",
        "def max_and_min(rt):\n",
        "  return (tf.math.reduce_max(rt, axis=-1), tf.math.reduce_min(rt, axis=-1))\n",
        "\n",
        "max_and_min(tf.ragged.constant([[1, 2], [3], [4, 5, 6]]))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "fSs-7E0VD85q"
      },
      "source": [
        "#### Concrete functions\n",
        "\n",
        "[Concrete functions](https://www.tensorflow.org/guide/function#obtaining_concrete_functions) encapsulate individual traced graphs that are built by `tf.function`. Ragged tensors can be used transparently with concrete functions.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "yyJeXJ4wFWox"
      },
      "outputs": [],
      "source": [
        "@tf.function\n",
        "def increment(x):\n",
        "  return x + 1\n",
        "\n",
        "rt = tf.ragged.constant([[1, 2], [3], [4, 5, 6]])\n",
        "cf = increment.get_concrete_function(rt)\n",
        "print(cf(rt))\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "iYLyPlatIXhh"
      },
      "source": [
        "### SavedModels\n",
        "\n",
        "A [SavedModel](https://www.tensorflow.org/guide/saved_model) is a serialized TensorFlow program, including both weights and computation.  It can be built from a Keras model or from a custom model.  In either case, ragged tensors can be used transparently with the functions and methods defined by a SavedModel.\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "98VpBSdOgWqL"
      },
      "source": [
        "#### Example: saving a Keras model"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "D-Dg9w7Je5pU"
      },
      "outputs": [],
      "source": [
        "import tempfile\n",
        "\n",
        "keras_module_path = tempfile.mkdtemp()\n",
        "tf.saved_model.save(keras_model, keras_module_path)\n",
        "imported_model = tf.saved_model.load(keras_module_path)\n",
        "imported_model(hashed_words)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "9-7k-E92gaoR"
      },
      "source": [
        "#### Example: saving a custom model\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "Sfem1ESrdGzX"
      },
      "outputs": [],
      "source": [
        "class CustomModule(tf.Module):\n",
        "  def __init__(self, variable_value):\n",
        "    super(CustomModule, self).__init__()\n",
        "    self.v = tf.Variable(variable_value)\n",
        "\n",
        "  @tf.function\n",
        "  def grow(self, x):\n",
        "    return x * self.v\n",
        "\n",
        "module = CustomModule(100.0)\n",
        "\n",
        "# Before saving a custom model, we must ensure that concrete functions are\n",
        "# built for each input signature that we will need.\n",
        "module.grow.get_concrete_function(tf.RaggedTensorSpec(shape=[None, None],\n",
        "                                                      dtype=tf.float32))\n",
        "\n",
        "custom_module_path = tempfile.mkdtemp()\n",
        "tf.saved_model.save(module, custom_module_path)\n",
        "imported_model = tf.saved_model.load(custom_module_path)\n",
        "imported_model.grow(tf.ragged.constant([[1.0, 4.0, 3.0], [2.0]]))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "SAxis5KBhrBN"
      },
      "source": [
        "Note: SavedModel [signatures](https://www.tensorflow.org/guide/saved_model#specifying_signatures_during_export) are concrete functions.  Ragged tensors are only handled correctly by concrete functions starting with TensorFlow 2.3.  If you need to use SavedModel signatures in an earlier version of TensorFlow, then we recommend decomposing the ragged tensor into its component tensors."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "cRcHzS6pcHYC"
      },
      "source": [
        "## Overloaded operators\n",
        "\n",
        "The `RaggedTensor` class overloads the standard Python arithmetic and comparison\n",
        "operators, making it easy to perform basic elementwise math:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "skScd37P-PVu"
      },
      "outputs": [],
      "source": [
        "x = tf.ragged.constant([[1, 2], [3], [4, 5, 6]])\n",
        "y = tf.ragged.constant([[1, 1], [2], [3, 3, 3]])\n",
        "print(x + y)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "XEGgbZHV-PVw"
      },
      "source": [
        "Since the overloaded operators perform elementwise computations, the inputs to\n",
        "all binary operations must have the same shape, or be broadcastable to the same\n",
        "shape. In the simplest broadcasting case, a single scalar is combined\n",
        "elementwise with each value in a ragged tensor:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "IYybEEWc-PVx"
      },
      "outputs": [],
      "source": [
        "x = tf.ragged.constant([[1, 2], [3], [4, 5, 6]])\n",
        "print(x + 3)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "okGb9dIi-PVz"
      },
      "source": [
        "For a discussion of more advanced cases, see the section on\n",
        "**Broadcasting**.\n",
        "\n",
        "Ragged tensors overload the same set of operators as normal `Tensor`s: the unary\n",
        "operators `-`, `~`, and `abs()`; and the binary operators `+`, `-`, `*`, `/`,\n",
        "`//`, `%`, `**`, `&`, `|`, `^`, `==`, `<`, `<=`, `>`, and `>=`.\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "f2anbs6ZnFtl"
      },
      "source": [
        "## Indexing\n",
        "\n",
        "Ragged tensors support Python-style indexing, including multidimensional\n",
        "indexing and slicing. The following examples demonstrate ragged tensor indexing\n",
        "with a 2-D and a 3-D ragged tensor."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "XuEwmC3t_ITL"
      },
      "source": [
        "### Indexing examples: 2D ragged tensor"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "MbSRZRDz-PV1"
      },
      "outputs": [],
      "source": [
        "queries = tf.ragged.constant(\n",
        "    [['Who', 'is', 'George', 'Washington'],\n",
        "     ['What', 'is', 'the', 'weather', 'tomorrow'],\n",
        "     ['Goodnight']])"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "2HRs2xhh-vZE"
      },
      "outputs": [],
      "source": [
        "print(queries[1])                   # A single query"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "EFfjZV7YA3UH"
      },
      "outputs": [],
      "source": [
        "print(queries[1, 2])                # A single word"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "VISRPQSdA3xn"
      },
      "outputs": [],
      "source": [
        "print(queries[1:])                  # Everything but the first row"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "J1PpSyKQBMng"
      },
      "outputs": [],
      "source": [
        "print(queries[:, :3])               # The first 3 words of each query"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "ixrhHmJBeidy"
      },
      "outputs": [],
      "source": [
        "print(queries[:, -2:])              # The last 2 words of each query"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "cnOP6Vza-PV4"
      },
      "source": [
        "### Indexing examples: 3D ragged tensor"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "8VbqbKcE-PV6"
      },
      "outputs": [],
      "source": [
        "rt = tf.ragged.constant([[[1, 2, 3], [4]],\n",
        "                         [[5], [], [6]],\n",
        "                         [[7]],\n",
        "                         [[8, 9], [10]]])"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "f9WPVWf4grVp"
      },
      "outputs": [],
      "source": [
        "print(rt[1])                        # Second row (2-D RaggedTensor)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "ad8FGJoABjQH"
      },
      "outputs": [],
      "source": [
        "print(rt[3, 0])                     # First element of fourth row (1-D Tensor)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "MPPr-a-bBjFE"
      },
      "outputs": [],
      "source": [
        "print(rt[:, 1:3])                   # Items at indices 1 and 2 of each row (3-D RaggedTensor)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "6SIDeoIUBi4z"
      },
      "outputs": [],
      "source": [
        "print(rt[:, -1:])                   # Last item of each row (3-D RaggedTensor)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "_d3nBh1GnWvU"
      },
      "source": [
        "`RaggedTensor`s support multidimensional indexing and slicing, with one\n",
        "restriction: indexing into a ragged dimension is not allowed. This case is\n",
        "problematic because the indicated value may exist in some rows but not others.\n",
        "In such cases, it's not obvious whether we should (1) raise an `IndexError`; (2)\n",
        "use a default value; or (3) skip that value and return a tensor with fewer rows\n",
        "than we started with. Following the\n",
        "[guiding principles of Python](https://www.python.org/dev/peps/pep-0020/)\n",
        "(\"In the face of ambiguity, refuse the temptation to guess\"), we currently\n",
        "disallow this operation."
      ]
    },
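    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The ambiguity can be made concrete with a minimal plain-Python sketch. Nested lists stand in for a ragged tensor, and the helper functions (`index_or_raise`, `index_or_default`, `index_or_skip`) are hypothetical names used only for illustration; they are not part of the TensorFlow API. Each helper implements one of the three candidate behaviors listed above for selecting a single column from rows of different lengths:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Plain-Python illustration only; nested lists stand in for a ragged tensor.\n",
        "rows = [['Who', 'is', 'George', 'Washington'],\n",
        "        ['What', 'is', 'the', 'weather', 'tomorrow'],\n",
        "        ['Goodnight']]\n",
        "column = 3\n",
        "\n",
        "def index_or_raise(rows, column):\n",
        "  # Option 1: fail as soon as one row is too short (raises IndexError).\n",
        "  return [row[column] for row in rows]\n",
        "\n",
        "def index_or_default(rows, column, default=''):\n",
        "  # Option 2: substitute a default value for missing entries.\n",
        "  return [row[column] if column < len(row) else default for row in rows]\n",
        "\n",
        "def index_or_skip(rows, column):\n",
        "  # Option 3: drop rows that are too short, so the result has fewer rows.\n",
        "  return [row[column] for row in rows if column < len(row)]\n",
        "\n",
        "try:\n",
        "  index_or_raise(rows, column)\n",
        "except IndexError as error:\n",
        "  print('Option 1 raised IndexError:', error)\n",
        "print('Option 2:', index_or_default(rows, column))\n",
        "print('Option 3:', index_or_skip(rows, column))"
      ]
    },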
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "IsWKETULAJbN"
      },
      "source": [
        "## Tensor type conversion\n",
        "\n",
        "The `RaggedTensor` class defines methods that can be used to convert\n",
        "between `RaggedTensor`s and `tf.Tensor`s or `tf.SparseTensor`s:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "INnfmZGcBoU_"
      },
      "outputs": [],
      "source": [
        "ragged_sentences = tf.ragged.constant([\n",
        "    ['Hi'], ['Welcome', 'to', 'the', 'fair'], ['Have', 'fun']])"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "__iJ4iXtkGOx"
      },
      "outputs": [],
      "source": [
        "# RaggedTensor -> Tensor\n",
        "print(ragged_sentences.to_tensor(default_value='', shape=[None, 10]))"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "-rfiyYqne8QN"
      },
      "outputs": [],
      "source": [
        "# Tensor -> RaggedTensor\n",
        "x = [[1, 3, -1, -1], [2, -1, -1, -1], [4, 5, 8, 9]]\n",
        "print(tf.RaggedTensor.from_tensor(x, padding=-1))"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "41WAZLXNnbwH"
      },
      "outputs": [],
      "source": [
        "# RaggedTensor -> SparseTensor\n",
        "print(ragged_sentences.to_sparse())"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "S8MkYo2hfVhj"
      },
      "outputs": [],
      "source": [
        "# SparseTensor -> RaggedTensor\n",
        "st = tf.SparseTensor(indices=[[0, 0], [2, 0], [2, 1]],\n",
        "                     values=['a', 'b', 'c'],\n",
        "                     dense_shape=[3, 3])\n",
        "print(tf.RaggedTensor.from_sparse(st))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "qx025sNMkAHH"
      },
      "source": [
        "## Evaluating ragged tensors\n",
        "\n",
        "To access the values in a ragged tensor, you can:\n",
        "\n",
        "1.  Use `tf.RaggedTensor.to_list()` to convert the ragged tensor to a\n",
        "    nested Python list.\n",
        "1.  Use `tf.RaggedTensor.numpy()` to convert the ragged tensor to a NumPy array\n",
        "    whose values are nested NumPy arrays.\n",
        "1.  Decompose the ragged tensor into its components, using the\n",
        "    `tf.RaggedTensor.values` and `tf.RaggedTensor.row_splits`\n",
        "    properties, or row-partitioning methods such as\n",
        "    `tf.RaggedTensor.row_lengths()` and `tf.RaggedTensor.value_rowids()`.\n",
        "1.  Use Python indexing to select values from the ragged tensor.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "uMm1WMkc-PV_"
      },
      "outputs": [],
      "source": [
        "rt = tf.ragged.constant([[1, 2], [3, 4, 5], [6], [], [7]])\n",
        "print(\"python list:\", rt.to_list())\n",
        "print(\"numpy array:\", rt.numpy())\n",
        "print(\"values:\", rt.values.numpy())\n",
        "print(\"splits:\", rt.row_splits.numpy())\n",
        "print(\"indexed value:\", rt[1].numpy())"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "EdljbNPq-PWS"
      },
      "source": [
        "## Broadcasting\n",
        "\n",
        "Broadcasting is the process of making tensors with different shapes have\n",
        "compatible shapes for elementwise operations. For more background on\n",
        "broadcasting, see:\n",
        "\n",
        "*   [NumPy: Broadcasting](https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html)\n",
        "*   `tf.broadcast_dynamic_shape`\n",
        "*   `tf.broadcast_to`\n",
        "\n",
        "The basic steps for broadcasting two inputs `x` and `y` to have compatible\n",
        "shapes are:\n",
        "\n",
        "1.  If `x` and `y` do not have the same number of dimensions, then add outer\n",
        "    dimensions (with size 1) until they do.\n",
        "\n",
        "2.  For each dimension where `x` and `y` have different sizes:\n",
        "\n",
        "    *   If `x` or `y` has size `1` in dimension `d`, then repeat its values\n",
        "        across dimension `d` to match the other input's size.\n",
        "\n",
        "    *   Otherwise, raise an exception (`x` and `y` are not broadcast\n",
        "        compatible).\n",
        "\n",
        "Here, the size of a tensor in a uniform dimension is a single number (the size\n",
        "of slices across that dimension), and the size of a tensor in a ragged dimension\n",
        "is a list of slice lengths (for all slices across that dimension)."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "-S2hOUWx-PWU"
      },
      "source": [
        "### Broadcasting examples"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "0n095XdR-PWU"
      },
      "outputs": [],
      "source": [
        "# x       (2d ragged):  2 x (num_rows)\n",
        "# y       (scalar)\n",
        "# Result  (2d ragged):  2 x (num_rows)\n",
        "x = tf.ragged.constant([[1, 2], [3]])\n",
        "y = 3\n",
        "print(x + y)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "0SVYk5AP-PWW"
      },
      "outputs": [],
      "source": [
        "# x         (2d ragged):  3 x (num_rows)\n",
        "# y         (2d tensor):  3 x          1\n",
        "# Result    (2d ragged):  3 x (num_rows)\n",
        "x = tf.ragged.constant(\n",
        "   [[10, 87, 12],\n",
        "    [19, 53],\n",
        "    [12, 32]])\n",
        "y = [[1000], [2000], [3000]]\n",
        "print(x + y)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "MsfBMD80s8Ux"
      },
      "outputs": [],
      "source": [
        "# x      (3d ragged):  2 x (r1) x 2\n",
        "# y      (2d tensor):         1 x 1\n",
        "# Result (3d ragged):  2 x (r1) x 2\n",
        "x = tf.ragged.constant(\n",
        "    [[[1, 2], [3, 4], [5, 6]],\n",
        "     [[7, 8]]],\n",
        "    ragged_rank=1)\n",
        "y = tf.constant([[10]])\n",
        "print(x + y)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "rEj5QVfnva0t"
      },
      "outputs": [],
      "source": [
        "# x      (4d ragged):  2 x (r1) x (r2) x 1\n",
        "# y      (1d tensor):                    3\n",
        "# Result (4d ragged):  2 x (r1) x (r2) x 3\n",
        "x = tf.ragged.constant(\n",
        "    [\n",
        "        [\n",
        "            [[1], [2]],\n",
        "            [],\n",
        "            [[3]],\n",
        "            [[4]],\n",
        "        ],\n",
        "        [\n",
        "            [[5], [6]],\n",
        "            [[7]]\n",
        "        ]\n",
        "    ],\n",
        "    ragged_rank=2)\n",
        "y = tf.constant([10, 20, 30])\n",
        "print(x + y)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "uennZ64Aqftb"
      },
      "source": [
        "Here are some examples of shapes that do not broadcast:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "UpI0FlfL4Eim"
      },
      "outputs": [],
      "source": [
        "# x      (2d ragged): 3 x (r1)\n",
        "# y      (2d tensor): 3 x    4  # trailing dimensions do not match\n",
        "x = tf.ragged.constant([[1, 2], [3, 4, 5, 6], [7]])\n",
        "y = tf.constant([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])\n",
        "try:\n",
        "  x + y\n",
        "except tf.errors.InvalidArgumentError as exception:\n",
        "  print(exception)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "qGq1zOT4zMoc"
      },
      "outputs": [],
      "source": [
        "# x      (2d ragged): 3 x (r1)\n",
        "# y      (2d ragged): 3 x (r2)  # ragged dimensions do not match.\n",
        "x = tf.ragged.constant([[1, 2, 3], [4], [5, 6]])\n",
        "y = tf.ragged.constant([[10, 20], [30, 40], [50]])\n",
        "try:\n",
        "  x + y\n",
        "except tf.errors.InvalidArgumentError as exception:\n",
        "  print(exception)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "CvLae5vMqeji"
      },
      "outputs": [],
      "source": [
        "# x      (3d ragged): 2 x (r1) x 2\n",
        "# y      (3d ragged): 2 x (r1) x 3  # trailing dimensions do not match\n",
        "x = tf.ragged.constant([[[1, 2], [3, 4], [5, 6]],\n",
        "                        [[7, 8], [9, 10]]])\n",
        "y = tf.ragged.constant([[[1, 2, 0], [3, 4, 0], [5, 6, 0]],\n",
        "                        [[7, 8, 0], [9, 10, 0]]])\n",
        "try:\n",
        "  x + y\n",
        "except tf.errors.InvalidArgumentError as exception:\n",
        "  print(exception)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "m0wQkLfV-PWa"
      },
      "source": [
        "## RaggedTensor encoding\n",
        "\n",
        "Ragged tensors are encoded using the `RaggedTensor` class. Internally, each\n",
        "`RaggedTensor` consists of:\n",
        "\n",
        "*   A `values` tensor, which concatenates the variable-length rows into a\n",
        "    flattened list.\n",
        "*   A `row_partition`, which indicates how those flattened values are divided\n",
        "    into rows.\n",
        "\n",
        "![ragged_encoding_2](https://www.tensorflow.org/images/ragged_tensors/ragged_encoding_2.png)\n",
        "\n",
        "The `row_partition` can be stored using four different encodings:\n",
        "\n",
        "*   `row_splits` is an integer vector specifying the split points between rows.\n",
        "*   `value_rowids` is an integer vector specifying the row index for each value.\n",
        "*   `row_lengths` is an integer vector specifying the length of each row.\n",
        "*   `uniform_row_length` is an integer scalar specifying a single length for\n",
        "    all rows.\n",
        "\n",
        "![partition_encodings](https://www.tensorflow.org/images/ragged_tensors/partition_encodings.png)\n",
        "\n",
        "An integer scalar `nrows` can also be included in the `row_partition` encoding, to account for empty trailing rows with `value_rowids`, or empty rows with `uniform_row_length`.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "MrLgMu0gPuo-"
      },
      "outputs": [],
      "source": [
        "rt = tf.RaggedTensor.from_row_splits(\n",
        "    values=[3, 1, 4, 1, 5, 9, 2],\n",
        "    row_splits=[0, 4, 4, 6, 7])\n",
        "print(rt)"
      ]
    },
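    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The relationship between these encodings can be sketched in plain Python. The helper functions below are hypothetical and operate on ordinary lists; they are not TensorFlow APIs. They derive `row_lengths`, `value_rowids`, and `nrows` from the `row_splits` used in the example above. On an actual `RaggedTensor`, the corresponding values are available through methods such as `rt.row_lengths()`, `rt.value_rowids()`, and `rt.nrows()`."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Plain-Python sketch; these helpers are illustrative, not TensorFlow APIs.\n",
        "def splits_to_lengths(row_splits):\n",
        "  # Each row length is the difference between consecutive split points.\n",
        "  return [end - start for start, end in zip(row_splits[:-1], row_splits[1:])]\n",
        "\n",
        "def splits_to_rowids(row_splits):\n",
        "  # Repeat each row index once per value stored in that row.\n",
        "  rowids = []\n",
        "  for row, length in enumerate(splits_to_lengths(row_splits)):\n",
        "    rowids.extend([row] * length)\n",
        "  return rowids\n",
        "\n",
        "def lengths_to_splits(row_lengths):\n",
        "  # Split points are the running total of the row lengths, starting at 0.\n",
        "  splits = [0]\n",
        "  for length in row_lengths:\n",
        "    splits.append(splits[-1] + length)\n",
        "  return splits\n",
        "\n",
        "row_splits = [0, 4, 4, 6, 7]\n",
        "print('row_lengths: ', splits_to_lengths(row_splits))\n",
        "print('value_rowids:', splits_to_rowids(row_splits))\n",
        "print('nrows:       ', len(row_splits) - 1)"
      ]
    },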
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "wEfZOKwN1Ra_"
      },
      "source": [
        "The choice of which encoding to use for row partitions is managed internally by ragged tensors, to improve efficiency in some contexts.  In particular, some of the advantages and disadvantages of the different row-partitioning\n",
        "schemes are:\n",
        "\n",
        "+ **Efficient indexing**:\n",
        "    The `row_splits` encoding enables\n",
        "    constant-time indexing and slicing into ragged tensors.\n",
        "\n",
        "+ **Efficient concatenation**:\n",
        "    The `row_lengths` encoding is more efficient when concatenating ragged\n",
        "    tensors, since row lengths do not change when two tensors are concatenated\n",
        "    together.\n",
        "\n",
        "+ **Small encoding size**:\n",
        "    The `value_rowids` encoding is more efficient when storing ragged tensors\n",
        "    that have a large number of empty rows, since the size of the tensor\n",
        "    depends only on the total number of values. On the other hand, the\n",
        "    `row_splits` and `row_lengths` encodings\n",
        "    are more efficient when storing ragged tensors with longer rows, since they\n",
        "    require only one scalar value for each row.\n",
        "\n",
        "+ **Compatibility**:\n",
        "    The `value_rowids` scheme matches the\n",
        "    [segmentation](https://www.tensorflow.org/api_docs/python/tf/math#about_segmentation)\n",
        "    format used by operations such as `tf.segment_sum`. The `row_lengths` scheme\n",
        "    matches the format used by ops such as `tf.sequence_mask`.\n",
        "\n",
        "+ **Uniform dimensions**:\n",
        "    As discussed below, the `uniform_row_length` encoding is used to encode\n",
        "    ragged tensors with uniform dimensions."
      ]
    },
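    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The indexing and concatenation trade-offs above can be illustrated with a small plain-Python sketch. Lists stand in for tensors and the function names (`get_row`, `concat_lengths`, `concat_splits`) are hypothetical; this is only an illustration of the bookkeeping involved, not a description of how TensorFlow implements these operations internally."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Plain-Python sketch; lists stand in for tensors.\n",
        "values = [3, 1, 4, 1, 5, 9, 2]\n",
        "row_splits = [0, 4, 4, 6, 7]\n",
        "row_lengths = [4, 0, 2, 1]\n",
        "\n",
        "def get_row(values, row_splits, i):\n",
        "  # With row_splits, row i is a single slice; no scan over earlier rows.\n",
        "  return values[row_splits[i]:row_splits[i + 1]]\n",
        "\n",
        "def concat_lengths(lengths_a, lengths_b):\n",
        "  # Row lengths are unchanged when rows are appended, so the two\n",
        "  # partitions can simply be joined.\n",
        "  return lengths_a + lengths_b\n",
        "\n",
        "def concat_splits(splits_a, splits_b):\n",
        "  # Split points of the second input must be shifted by the number of\n",
        "  # values in the first input.\n",
        "  offset = splits_a[-1]\n",
        "  return splits_a + [offset + split for split in splits_b[1:]]\n",
        "\n",
        "print(get_row(values, row_splits, 2))\n",
        "print(concat_lengths(row_lengths, [2, 3]))\n",
        "print(concat_splits(row_splits, [0, 2, 5]))"
      ]
    },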
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "bpB7xKoUPtU6"
      },
      "source": [
        "### Multiple ragged dimensions\n",
        "\n",
        "A ragged tensor with multiple ragged dimensions is encoded by using a nested\n",
        "`RaggedTensor` for the `values` tensor. Each nested `RaggedTensor` adds a single\n",
        "ragged dimension.\n",
        "\n",
        "![ragged_rank_2](https://www.tensorflow.org/images/ragged_tensors/ragged_rank_2.png)\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "yy3IGT2a-PWb"
      },
      "outputs": [],
      "source": [
        "rt = tf.RaggedTensor.from_row_splits(\n",
        "    values=tf.RaggedTensor.from_row_splits(\n",
        "        values=[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],\n",
        "        row_splits=[0, 3, 3, 5, 9, 10]),\n",
        "    row_splits=[0, 1, 1, 5])\n",
        "print(rt)\n",
        "print(\"Shape: {}\".format(rt.shape))\n",
        "print(\"Number of partitioned dimensions: {}\".format(rt.ragged_rank))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "5HqEEDzk-PWc"
      },
      "source": [
        "The factory function `tf.RaggedTensor.from_nested_row_splits` may be used to construct a\n",
        "RaggedTensor with multiple ragged dimensions directly, by providing a list of\n",
        "`row_splits` tensors:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "AKYhtFcT-PWd"
      },
      "outputs": [],
      "source": [
        "rt = tf.RaggedTensor.from_nested_row_splits(\n",
        "    flat_values=[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],\n",
        "    nested_row_splits=([0, 1, 1, 5], [0, 3, 3, 5, 9, 10]))\n",
        "print(rt)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "BqAfbkAC56m0"
      },
      "source": [
        "### Ragged rank and flat values\n",
        "\n",
        "A ragged tensor's ***ragged rank*** is the number of times that the underlying\n",
        "`values` Tensor has been partitioned (i.e., the nesting depth of  `RaggedTensor` objects).  The innermost `values` tensor is known as its ***flat_values***.  In the following example, `conversations` has ragged_rank=3, and its `flat_values` is a 1D `Tensor` with 24 strings:\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "BXp-Tt2bClem"
      },
      "outputs": [],
      "source": [
        "# shape = [batch, (paragraph), (sentence), (word)]\n",
        "conversations = tf.ragged.constant(\n",
        "    [[[[\"I\", \"like\", \"ragged\", \"tensors.\"]],\n",
        "      [[\"Oh\", \"yeah?\"], [\"What\", \"can\", \"you\", \"use\", \"them\", \"for?\"]],\n",
        "      [[\"Processing\", \"variable\", \"length\", \"data!\"]]],\n",
        "     [[[\"I\", \"like\", \"cheese.\"], [\"Do\", \"you?\"]],\n",
        "      [[\"Yes.\"], [\"I\", \"do.\"]]]])\n",
        "conversations.shape"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "DZUMrgxXFd5s"
      },
      "outputs": [],
      "source": [
        "assert conversations.ragged_rank == len(conversations.nested_row_splits)\n",
        "conversations.ragged_rank  # Number of partitioned dimensions."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "xXLSNpS0Fdvp"
      },
      "outputs": [],
      "source": [
        "conversations.flat_values.numpy()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "uba2EnAY-PWf"
      },
      "source": [
        "### Uniform inner dimensions\n",
        "\n",
        "Ragged tensors with uniform inner dimensions are encoded by using a\n",
        "multidimensional `tf.Tensor` for the flat_values (i.e., the innermost `values`).\n",
        "\n",
        "![uniform_inner](https://www.tensorflow.org/images/ragged_tensors/uniform_inner.png)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "z2sHwHdy-PWg"
      },
      "outputs": [],
      "source": [
        "rt = tf.RaggedTensor.from_row_splits(\n",
        "    values=[[1, 3], [0, 0], [1, 3], [5, 3], [3, 3], [1, 2]],\n",
        "    row_splits=[0, 3, 4, 6])\n",
        "print(rt)\n",
        "print(\"Shape: {}\".format(rt.shape))\n",
        "print(\"Number of partitioned dimensions: {}\".format(rt.ragged_rank))\n",
        "print(\"Flat values shape: {}\".format(rt.flat_values.shape))\n",
        "print(\"Flat values:\\n{}\".format(rt.flat_values))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "WoGRKd50x_qz"
      },
      "source": [
        "### Uniform non-inner dimensions\n",
        "\n",
        "Ragged tensors with uniform non-inner dimensions are encoded by partitioning rows with `uniform_row_length`.\n",
        "\n",
        "![uniform_outer](https://www.tensorflow.org/images/ragged_tensors/uniform_outer.png)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "70q1aCKwySgS"
      },
      "outputs": [],
      "source": [
        "rt = tf.RaggedTensor.from_uniform_row_length(\n",
        "    values=tf.RaggedTensor.from_row_splits(\n",
        "        values=[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],\n",
        "        row_splits=[0, 3, 5, 9, 10]),\n",
        "    uniform_row_length=2)\n",
        "print(rt)\n",
        "print(\"Shape: {}\".format(rt.shape))\n",
        "print(\"Number of partitioned dimensions: {}\".format(rt.ragged_rank))"
      ]
    }
  ],
  "metadata": {
    "colab": {
      "collapsed_sections": [],
      "name": "ragged_tensor.ipynb",
      "toc_visible": true
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}
